Conversation

@AsyaPronina
Contributor

Details:

  • Fix LLM inference on NPU for input prompt of length 1

Tickets:

  • N/A

@AsyaPronina AsyaPronina requested review from a team as code owners October 1, 2025 13:43
@github-actions github-actions bot added category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin labels Oct 1, 2025
@dmatveev dmatveev added this to the 2025.4 milestone Oct 1, 2025
Comment on lines 545 to 638
uint32_t num_embeds_dim = 1 - batch_dim;
if (shape[num_embeds_dim] > max_generation_token_len) {
Contributor

Somehow I overlooked it in the past, but what is `batch_dim`? Can this `1 - x` underflow to some hugely positive value here?

Contributor Author

No, because `batch_dim` is either 0 or 1 (for chat-glm), but great catch!! Let me add an assert!

@AsyaPronina AsyaPronina force-pushed the npuw_one_token_prompt_fix branch from 69cb08a to 74744ef Compare October 9, 2025 13:12
@AsyaPronina AsyaPronina changed the title Fix LLMInferRequest to work with the input prompt of len 1 [NPUW] Fix LLMInferRequest to work with the input prompt of len 1 Oct 9, 2025
@dmatveev dmatveev self-assigned this Oct 10, 2025
@dmatveev dmatveev added this pull request to the merge queue Oct 10, 2025
Merged via the queue into openvinotoolkit:master with commit a02f8c4 Oct 10, 2025
211 of 213 checks passed
@dmatveev dmatveev deleted the npuw_one_token_prompt_fix branch October 10, 2025 14:05

3 participants